Abstract

Suppose we have a signal $y$ which we wish to represent using a linear combination of a number
of basis atoms $a_i$: $y = \sum_i x_i a_i = Ax$. The problem of finding the minimum $\ell_0$ norm
representation for $y$ is a hard problem. The Basis Pursuit (BP) approach proposes to find the
minimum $\ell_1$ norm representation instead, which corresponds to a linear program (LP) that can
be solved using modern LP techniques, and several recent authors have given conditions for the BP
(minimum $\ell_1$ norm) and sparse (minimum $\ell_0$ norm) representations to be identical. In this
paper, we explore this sparse representation problem using the geometry of convex polytopes, as
recently introduced into the field by Donoho. By considering the dual LP we find that the so-called
polar polytope $P^*$ of the centrally-symmetric polytope $P$ whose vertices are the atom pairs
$\pm a_i$ is particularly helpful in providing us with geometrical insight into optimality
conditions given by Fuchs and Tropp for non-unit-norm atom sets. In exploring this geometry we are
able to tighten some of these earlier results, showing for example that the Fuchs condition is both
necessary and sufficient for $\ell_1$-unique-optimality, and that there are situations where
Orthogonal Matching Pursuit (OMP) can eventually find all $\ell_1$-unique-optimal solutions with
$m$ nonzeros even if ERC fails for $m$, if allowed to run for more than $m$ steps.
I. Introduction

Suppose we have a vector $y = [y_1, \ldots, y_d]^T$ which we wish to represent using a linear
combination from $n$ nonzero $d$-dimensional basis atoms $a_i$: $y = \sum_i x_i a_i$. In other
words, we wish to find an $n$-vector $x = [x_1, \ldots, x_n]^T$ such that $y = Ax$, where
$A = [a_i]$ is the $d \times n$ matrix whose $i$th column is $a_i$. Unless specified otherwise, the
vectors $a_i$ are not required to be unit norm, i.e. $\|a_i\|_2 \neq 1$ in general. In the special
case where the $a_i$ are unit norm, we may call $A$ a dictionary [1].

We consider the case where we have more atoms $a_i \in A$ than observation dimensions, $n > d$,
and there are therefore many possible representations $Ax = y$ for a given $A$ and $y$. The
sparse representation problem is then to find the representation $x$ with the fewest possible
non-zero components,

$\min_x \|x\|_0$ such that $Ax = y$    (P0)

where $\|x\|_0$ is the $\ell_0$ norm of $x$, i.e. the number of non-zero elements. This is well
known to be a hard problem [2].

In the signal processing community, Chen, Donoho and Saunders [2] proposed to approximate
(P0) with the 'relaxed' $\ell_1$ problem

$\min_x \|x\|_1$ such that $Ax = y$    (P1)

where $\|x\|_1 = \sum_i |x_i|$ is the $\ell_1$ norm of $x$. Problem (P1), which they called Basis
Pursuit (BP), can be formulated as a linear programming (LP) problem, which can be solved using
well known optimization methods such as the simplex method or interior point methods [3]. They
observed experimentally that the solution to (P1) often found a 'good' sparse representation for
$y$, and gave examples where it produced better results than the greedy algorithms Matching
Pursuit (MP) [1] or Orthogonal Matching Pursuit (OMP) [4].

Subsequently a number of authors have explored the conditions under which the minimum of
(P1) is unique and identical to the minimum of (P0), sometimes called exact recovery or
$\ell_1/\ell_0$ equivalence. For example, for dictionaries of unit norm atoms, Donoho and Huo [5]
showed that if $A$ is the union of a pair of orthonormal 'time' and 'frequency' (spike and
Fourier) bases, so that $n = 2d$, then $\ell_1/\ell_0$ equivalence holds for a representation
$y = Ax$ if $x$ has $m = \|x\|_0 < \frac{1}{2}\sqrt{d}$ nonzeros. With
$M = \max_{i \neq k} |\langle a_i, a_k \rangle|$ defined to be the coherence of the dictionary,
they also showed that $\ell_1/\ell_0$ equivalence holds if $m < \frac{1}{2}(1 + M^{-1})$ [5]. Elad
and Bruckstein [6] improved this bound to $m < (\sqrt{2} - 0.5) M^{-1} = 0.9142 M^{-1}$ for a pair
of orthonormal bases, and Donoho and Elad [7] and independently Gribonval and Nielsen [8]
generalized these bounds for more general dictionaries of non-orthogonal unit-norm vectors.
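
As a small numerical illustration of the coherence bound, the following sketch computes $M$ and the bound $\frac{1}{2}(1 + M^{-1})$ for a toy dictionary. The dictionary is our own assumption for illustration: a spike basis together with a normalized Hadamard basis in $d = 4$, used here as a real-valued stand-in for the spike and Fourier pair. The function names `coherence` and `equivalence_bound` are hypothetical.

```python
def coherence(atoms):
    """Return M = max over distinct pairs of |<a_i, a_k>| (atoms assumed unit norm)."""
    best = 0.0
    for i in range(len(atoms)):
        for k in range(i + 1, len(atoms)):
            inner = sum(p * q for p, q in zip(atoms[i], atoms[k]))
            best = max(best, abs(inner))
    return best


def equivalence_bound(M):
    """Sparsity level below which the coherence result guarantees equivalence."""
    return 0.5 * (1.0 + 1.0 / M)


# Illustrative dictionary (an assumption, not taken from the paper):
# spike basis plus a normalized 4x4 Hadamard basis, so n = 2d = 8.
d = 4
spikes = [[1.0 if i == j else 0.0 for i in range(d)] for j in range(d)]
signs = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
hadamard = [[s / 2.0 for s in row] for row in signs]
atoms = spikes + hadamard

M = coherence(atoms)          # equals 1/sqrt(d) = 0.5 for this pair of bases
bound = equivalence_bound(M)  # 1.5, so only m = 1 is covered by the bound
```

For this toy dictionary the bound only covers representations with a single nonzero, which illustrates how conservative coherence-based guarantees can be in low dimensions.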

A. Recovery conditions on general, non-unit-norm atom sets 

In this paper we will consider the more general case of non-unit-norm atom sets. In the longer 
term we are interested in learning appropriate atom sets and may not want to constrain these to 
be unit norm. Also, for the purposes of the current paper, the usual unit-norm requirement on 
atoms means that the d = 2 case is somewhat 'too well behaved', making construction of simple 
2D visualizations more difficult than necessary. 

For clarity it can be helpful to decompose $\ell_1/\ell_0$ equivalence for a representation $x_0$
into two separate conditions:

1. $x_0$ is the unique optimum to (P0) ($\ell_0$-unique-optimality)

2. $x_0$ is the unique optimum to (P1) ($\ell_1$-unique-optimality)

To show $\ell_1/\ell_0$ equivalence for a given $x_0$ it is sufficient to show that $x_0$ satisfies
both $\ell_0$-unique-optimality and $\ell_1$-unique-optimality. To show $\ell_1/\ell_0$ equivalence
for a set of representations, it is sufficient to show that both conditions hold for all
representations $x_0$ in that set. Let us deal with $\ell_0$-unique-optimality first.

We define the Spark of a matrix, $\sigma = \mathrm{Spark}(A)$, to be the smallest number such that
there exists a subset of $\sigma$ columns from $A$ that are linearly dependent [7]. Given a matrix
$A \in \mathbb{R}^{d \times n}$ with $n > d$, if all subsets of $d$ columns from $A$ are linearly
independent, then $\mathrm{Spark}(A) = d + 1$.

Theorem 1.1 (Donoho and Elad [7, Corollary 1]: $\ell_0$-Uniqueness) A representation $y = Ax_0$
with $m = \|x_0\|_0$ nonzeros is $\ell_0$-unique-optimal (i.e. the sparsest possible
representation) if $m < \mathrm{Spark}(A)/2$.

In particular this means that if all subsets of $d$ columns of $A \in \mathbb{R}^{d \times n}$ are
linearly independent, then Theorem 1.1 holds with $m < (d + 1)/2$. Consequently the combination of
$\ell_1$-unique-optimality and $m < \mathrm{Spark}(A)/2$ is sufficient to show $\ell_1/\ell_0$
equivalence. In the remainder of this paper we will therefore concentrate on the condition of
$\ell_1$-unique-optimality.
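
To make the definition of the Spark concrete, the following is a minimal brute-force sketch that searches subsets of columns in order of increasing size. It is exponential in $n$ and intended only for toy examples; the helper names `rank` and `spark` and the two example atom sets are our own assumptions. Exact rational arithmetic is used so that linear dependence is tested without rounding error.

```python
from fractions import Fraction
from itertools import combinations


def rank(vectors):
    """Rank of a list of vectors, by exact Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0
    width = len(rows[0]) if rows else 0
    for c in range(width):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def spark(columns):
    """Smallest number of columns that are linearly dependent."""
    n = len(columns)
    for size in range(1, n + 1):
        for subset in combinations(columns, size):
            if rank(list(subset)) < size:
                return size
    return n + 1  # no dependent subset exists (only possible when n <= d)


# Atoms in general position in d = 2: Spark = d + 1 = 3.
general = [(1, 0), (0, 1), (1, 1)]
# Two parallel atoms: Spark = 2, so Theorem 1.1 covers no m >= 1.
parallel = [(1, 0), (0, 1), (2, 0)]

spark_general = spark(general)
spark_parallel = spark(parallel)
```

With `general`, the condition $m < \mathrm{Spark}(A)/2 = 1.5$ guarantees $\ell_0$-uniqueness only for single-atom representations.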

For the case of general (non-unit-norm) sets of real atoms, authors including Tropp [9] and Fuchs
[10] have derived conditions for $\ell_1$-unique-optimality. We begin with the condition
introduced recently by Fuchs [10]. For some $y$ represented by a linear combination of $m < d$
atoms in $A$, let $x_0$ be the desired solution of $y = Ax_0$ to be recovered, with
$m = \|x_0\|_0$ non-zero elements.

Theorem 1.2 (Fuchs Condition [10, Theorem 4]) Let $x_{opt}$ be the $m$-dimensional vector built
from the nonzero components of $x_0$, with $A_{opt}$ the $d \times m$ matrix built from the
corresponding columns of $A$ such that $y = A_{opt} x_{opt} = A x_0$. If $A_{opt}$ is full rank,
and there exists some $c \in \mathbb{R}^d$ satisfying

$A_{opt}^T c = \mathrm{sign}\, x_{opt}$    (1)
$|a_j^T c| < 1$ for any $a_j \in A$, $a_j \notin A_{opt}$    (2)

then $x_0$ is the unique solution to (P1).

This means that if $y = Ax_0$ is a sparse representation of $y$ such that the conditions in
Theorem 1.2 hold, then Basis Pursuit (BP) will find this sparse representation. For an extension
of Theorem 1.2 to the complex domain see Tropp [11].

Using $c = (A_{opt}^\dagger)^T \mathrm{sign}\, x_{opt}$ in Theorem 1.2, where $A_{opt}^\dagger$ is
the Moore-Penrose pseudoinverse of $A_{opt}$, we obtain the following result (introduced
originally in [12]):

Corollary 1.3 (Fuchs Corollary [12]) Let $x_{opt}$ and $A_{opt}$ be given as in Theorem 1.2. If
$A_{opt}$ is full rank, and

$|a_j^T (A_{opt}^\dagger)^T \mathrm{sign}\, x_{opt}| < 1$ for any $a_j \in A$, $a_j \notin A_{opt}$    (3)

then $x_0$ is the unique solution to (P1).
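
Condition (3) is straightforward to test numerically for a given support and sign pattern. The sketch below is one possible way to do so, assuming $A_{opt}$ has full column rank so that $(A_{opt}^\dagger)^T = A_{opt}(A_{opt}^T A_{opt})^{-1}$. The function names and the three-atom example in $d = 2$ are our own illustrative assumptions, and exact rational arithmetic is used to avoid ambiguity at the boundary value 1.

```python
from fractions import Fraction


def dot(u, v):
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))


def solve(G, b):
    """Solve the square system G z = b exactly (G assumed nonsingular)."""
    n = len(G)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(G, b)]
    for c in range(n):
        p = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [a - f * w for a, w in zip(M[i], M[c])]
    return [row[n] for row in M]


def fuchs_corollary_holds(atoms, support, signs):
    """Test condition (3) for the atoms indexed by `support` with given signs."""
    opt = [atoms[i] for i in support]
    gram = [[dot(u, v) for v in opt] for u in opt]
    z = solve(gram, signs)  # z = (A_opt^T A_opt)^{-1} sign(x_opt)
    d = len(atoms[0])
    # c = A_opt z, i.e. the transposed pseudoinverse applied to the sign vector
    c = [sum(z[k] * opt[k][r] for k in range(len(opt))) for r in range(d)]
    return all(abs(dot(a, c)) < 1
               for j, a in enumerate(atoms) if j not in support)


# Illustrative atom set (an assumption): a_1, a_2, a_3 in d = 2.
atoms = [(1, 0), (0, 1), (1, 1)]
holds_a3 = fuchs_corollary_holds(atoms, [2], [1])                  # True
holds_plus_plus = fuchs_corollary_holds(atoms, [0, 1], [1, 1])     # False
holds_plus_minus = fuchs_corollary_holds(atoms, [0, 1], [1, -1])   # True
```

The failure for the support $\{a_1, a_2\}$ with signs $(+, +)$ is expected in this example: any such $y$ can be written more cheaply using $a_3 = a_1 + a_2$. Note that a failure of (3) does not by itself rule out $\ell_1$-unique-optimality, since the corollary tests only one particular choice of $c$.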

The conditions involved in Theorem 1.2 and Corollary 1.3 seem at first somewhat awkward to
visualize, in that they involve the sign of $x_{opt}$ as well as its support [13], [11]. However,
we shall show in this paper that they correspond to finding points $c$ on a particular geometrical
object, the polar polytope, whose vertices and faces correspond to signed support basis sets. We
shall also show that the condition in Theorem 1.2 is the weakest possible, in that it is both
necessary and sufficient for $\ell_1$-unique-optimality.

Perhaps more well known than the Fuchs condition above is the Exact Recovery Condition (ERC) 
introduced by Tropp [9]. 

Theorem 1.4 (Tropp [9]: Exact Recovery Condition) Let us have $x_0$ and $A_{opt}$ as in Theorem
1.2 above. If

$\max_{a_j \notin A_{opt}} \|A_{opt}^\dagger a_j\|_1 < 1$    (4)

where $a_j$ ranges over the atoms in $A$ which are not in the $m$-term representation of $y$, then
$x_0$ is the unique solution to (P1).

Hence a representation $y = A_{opt} x_{opt}$ can be recovered by BP whenever (4) is satisfied. The
quantity $\max_{a_j \notin A_{opt}} \|A_{opt}^\dagger a_j\|_1$ is referred to as the exact
recovery coefficient.

Tropp [9] also showed that (4) guarantees that the Orthogonal Matching Pursuit (OMP) algorithm
will find the solution $x_0$ in $m$ steps. This condition also applies for exponential convergence
of ordinary matching pursuit (MP) to the solution $x_0$ [14].

Although the approaches of Fuchs [10] and Tropp [9] are very different, Gribonval and Nielsen
[13] pointed out that they are closely linked. Specifically we have

$\max_{x_{opt}} \max_{a_j \notin A_{opt}} |\mathrm{sign}(x_{opt})^T A_{opt}^\dagger a_j| = \max_{x_{opt}} \max_{a_j \notin A_{opt}} |\langle \mathrm{sign}(x_{opt}), A_{opt}^\dagger a_j \rangle| = \max_{a_j \notin A_{opt}} \|A_{opt}^\dagger a_j\|_1$    (5)

so the Exact Recovery Condition (Theorem 1.4) is itself a corollary of the Fuchs Corollary
(Corollary 1.3). Thus ERC is a stronger condition than the Fuchs Condition (Theorem 1.2), and
there are in fact cases where OMP will not give the same solution as BP.
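
The implication from the ERC to the Fuchs Corollary can be written out in one line using Hölder's inequality. The following is a short worked derivation using only the quantities already defined; it assumes, as in Theorem 1.2, that every entry of $\mathrm{sign}\, x_{opt}$ is $\pm 1$.

```latex
\begin{align*}
\bigl| a_j^T (A_{opt}^\dagger)^T \operatorname{sign} x_{opt} \bigr|
  &= \bigl| \langle \operatorname{sign} x_{opt},\; A_{opt}^\dagger a_j \rangle \bigr| \\
  &\le \| \operatorname{sign} x_{opt} \|_\infty \, \| A_{opt}^\dagger a_j \|_1
   = \| A_{opt}^\dagger a_j \|_1 .
\end{align*}
```

Equality is attained when the sign pattern agrees with the signs of the entries of $A_{opt}^\dagger a_j$, which is why the maximum over all sign patterns gives exactly the $\ell_1$ norm. The ERC therefore asks for condition (3) to hold for every sign pattern on the given support, rather than for one particular pattern.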

In an interesting new direction, Donoho [15], [16] has explored the link between sparse recovery
and the geometry of polytopes, convex sets defined by a finite set of vertices or inequalities.
Donoho showed that $\ell_1/\ell_0$ equivalence of certain representations $x_0$ can be linked to
the existence of particular faces of a polytope $P$ whose vertices are the atom pairs $\pm a_i$
with $a_i \in A$. If each subset of $k$ signed atoms forms the vertices of a true face of $P$
(i.e. $P$ is $k$-neighbourly) then $\ell_1/\ell_0$ equivalence holds for all representations $x_0$
with at most $k$ nonzeros.

This powerful new approach means that results from the field of polytopes can be brought across
to the sparse representations problem, and vice versa. For example, using the classic work of
McMullen and Shephard [17] on centrally symmetric polytopes, Donoho showed [15, Corollary 1.3]
the surprising result that for $n - 2 \ge d > 2$, the condition $k \le \lfloor (d + 1)/3 \rfloor$
must hold for $\ell_1/\ell_0$ equivalence of all representations $x_0$ having at most $k$
nonzeros.

The structure of this paper is as follows. In section II we introduce some polytope notation and 
discuss the polytope approach to the sparse representation problem. In section III we consider the 
dual LP problem and the corresponding polar (dual) polytope and its visualization. In section IV 
we investigate the Fuchs Condition and its geometry on the polar polytope, and link this to the 
Donoho results on the primal polytope. In the subsequent sections we apply this approach to the 
Fuchs Corollary and the Exact Recovery Condition, and consider the special geometry of unit norm 
dictionaries. Finally we consider Matching Pursuit algorithms, before drawing our Conclusions. For 
ease of visualization we will use real geometry in this paper, so all our matrices and vectors will be 
real. 

II. Polytopes and sparse recovery 

We will develop some low-dimensional examples to illustrate the recovery conditions described 
above. We will see that, even in 2 dimensions, we can gain considerable insight into the geometric 
meaning of these recovery conditions. First let us define some terminology (see e.g. [18]). 

Recall that a set $S \subset \mathbb{R}^d$ is convex if it contains all line segments connecting
any pair of points in $S$, i.e. $x, y \in S$ implies $tx + (1 - t)y \in S$ for all
$0 \le t \le 1$. A point $x \in S$ is called an extreme point if it cannot be represented as a
convex combination of two other points in $S$. The convex hull, $\mathrm{conv}\, X$, of a subset
$X \subset \mathbb{R}^d$ is the smallest convex set containing $X$. The affine hull,
$\mathrm{aff}\, X$, of a set of points $x_i \in X$ is the set of affine combinations
$x = \sum_i \lambda_i x_i$ for reals $\lambda_i$ with $\sum_i \lambda_i = 1$. A set of points is
said to be affinely independent if none of the points $x_i$ can be represented by an affine
combination of the other points.

A convex polytope is a bounded subset of $\mathbb{R}^d$ that is the set of solutions to a finite
system of linear inequalities. We normally omit the qualifier convex. For example, given a
$d \times n$ matrix $A$ and a $d$-vector $y$, the set $P = \{x \mid Ax \le y\}$ is a polytope if
it is bounded: the notation $Ax \le y$ means $a_i^T x \le y_i$ for all $i$, where $a_i^T$ is a row
of $A$ and $y_i$ is the corresponding element of $y$. (Without the boundedness condition, $P$
would be a polyhedron.) We refer to a $d$-dimensional polytope as a $d$-polytope. A simplex is the
simplest type of polytope, and is the $d$-dimensional convex hull of some $d + 1$ affinely
independent points: we can call this a $d$-simplex.

A linear inequality $a^T x \le b$ is called valid for a polytope $P$ if it holds for all elements
of $P$. A subset $F$ of $P$ is called a face of $P$ if $F = \emptyset$ or $F = P$ (the improper
faces), or

$F = P \cap \{x \mid a^T x = b\}$

for some valid inequality $a^T x \le b$ with scalar $b$. Faces of dimension 0 and $d - 1$ are
called vertices and facets, respectively. The vertices are also the extreme points of $P$. Faces
of dimension $k$ are called $k$-faces: these correspond to subsets where (at least) $d - k$ of the
inequalities $\{a_j^T x \le y_j\}$ hold with equality. Faces of a polytope are themselves
polytopes.
There are two different ways to represent a polytope $P$: by inequalities or by vertices. If using
inequalities, each inequality defines a halfspace $H_i = \{x \mid a_i^T x \le y_i\}$ and $P$ is
therefore the intersection of all the relevant halfspaces, $P = \bigcap_i H_i$. This is called the
H-representation for $P$. Alternatively, we can use the set of vertices
$V = \{v_1, \ldots, v_p\}$ so that the polytope $P = \mathrm{conv}\{v_1, \ldots, v_p\}$ is the
convex hull of the set of vertices $V$: this is called the V-representation for $P$. Converting
from H-representation to V-representation is called the vertex enumeration problem, while
converting from V-representation to H-representation is called the convex hull problem (or facet
enumeration problem) [19].

Fig. 1. Polytope in two dimensions

To summarize some of this terminology, see Fig. 1. The 2-polytope $P$ has been specified based
on its vertices (0-faces) $v_1, \ldots, v_4$. We can visually verify that the polytope $P$ is the
convex hull $\mathrm{conv}\{v_1, \ldots, v_4\}$ generated by the vertices of the polytope. The
polytope is also defined by halfspaces such as $H_{12}$: these are shown as dotted lines, with
$H_{12}^{+}$ indicating the half of $\mathbb{R}^d$ included in $H_{12}$, and $H_{12}^{-}$ the half
not included in $H_{12}$.

In fact, Fig. 1 illustrates a specific type of polytope called a centrally symmetric polytope. A
polytope is centrally symmetric if it is symmetric about the origin $O$, i.e.
$x \in P \Leftrightarrow -x \in P$.

Specifically this means that its vertices come in opposite-sign pairs $(v_i, -v_i)$, and the
inequalities defining the halfspaces also come in opposite-sign pairs. Thus if the inequality
$a^T x \le b$ is valid for $P$, then the negative version $-a^T x \le b$ must also be valid.
Centrally-symmetric polytopes are particularly useful for our consideration of sparse coding.

A. Neighbourliness and sparse recovery 

Now let us form the centrally symmetric polytope $P$ whose $2n$ vertices are the positive and
negative versions of the basis vectors $\pm a_i$ in our atom matrix $A$. We say that the columns
of $A$ are in general position (in this context of defining the vertices of a centrally-symmetric
polytope) if all subsets of $d$ columns of $A$ are linearly independent (so
$\mathrm{Spark}(A) = d + 1$).

A centrally-symmetric polytope $P$ is called $k$-neighbourly if every subset of $k$ vertices of
$P$ which does not contain two opposite vertices of $P$ forms the vertices of a $(k - 1)$-simplex
which is a face of $P$. In other words, if for each of the $\binom{n}{k} \times 2^k$ ways we can
choose a set of $k$ basis vectors $a_i$ and signs $\sigma_i \in \{-1, +1\}$, the $k$ signed
vectors $\sigma_i a_i$ are the vertices of a $(k - 1)$-dimensional face of $P$, then $P$ is
$k$-neighbourly.
Theorem II.1 (Donoho [15, Theorem 1]) Let $P$ be the polytope whose $2n$ vertices are the positive
and negative atoms $\pm a_i$ with $a_i \in A$. Then $P$ is $k$-neighbourly if and only if every
solution $x_0$ to $y = Ax$ with at most $k$ nonzeros is the unique solution to (P1).

In other words if $P$ is $k$-neighbourly, then BP will find all sparse representations $x_0$ with
$\|x_0\|_0 \le k$, i.e. $x_0$ has at most $k$ nonzero elements. Results from the theory of convex
polytopes [17] then give us e.g. $k \le \lfloor (d + 1)/3 \rfloor$ if $n \ge d + 2$, $d > 2$.
Fig. 2. Neighbourly and non-neighbourly polytopes.

Let us give a visualization of this property in 2 dimensions. In Fig. 2(a) we have $n = 2$
dictionary vectors $a_1$ and $a_2$ in $d = 2$ dimensions. Firstly, we see that the polytope $P_1$
has all $2n = 4$ vertices. It is also trivially $k$-neighbourly for $k = 1$, since all
$\binom{2}{1} \times 2^1 = 4$ ways of choosing a single vertex are the vertices themselves, and
hence faces of $P_1$. For $k = 2$, we can list all $\binom{2}{2} \times 2^2 = 4$ sets of two
vertices (excluding antipodal pairs): these are $(a_1, a_2)$, $(a_2, -a_1)$, $(-a_1, -a_2)$ and
$(-a_2, a_1)$. We can see that each of these vertex pairs forms the two vertices of a 1-face
(edge) of $P_1$: the 1-faces are simply the line segments between the selected pair of vertices.

In Fig. 2(b) however we have $n = 3$ dictionary vectors $a_1, a_2, a_3$, although still in
$d = 2$ dimensions. All $2n = 6$ vertices are present, and hence it is again 1-neighbourly.
However, there are $\binom{3}{2} \times 2^2 = 12$ ways to choose two vertices, but $P_2$ has only
6 edges (1-faces), so it is not 2-neighbourly. For example, while the vertex pairs $(a_1, a_2)$
and $(a_1, -a_3)$ form 1-faces of $P_2$, the vertex pairs $(a_1, a_3)$ and $(a_1, -a_2)$ do not.
Intuitively, we might expect that any $y$ which is composed of a positive linear combination of
$a_1$ and $a_3$ cannot be recovered using the linear program (P1). To gain further insight into
this process, we next introduce a dual polytope that corresponds to the dual LP of (P1).
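
In $d = 2$ the 2-neighbourliness test reduces to checking whether each non-antipodal pair of signed atoms spans an edge of the polygon $P$. The following sketch does this with a simple cross-product test. It assumes that all $2n$ signed atoms are distinct vertices of $P$ (i.e. $P$ is already 1-neighbourly); the function names and the two example atom sets are our own and are not the atom sets drawn in Fig. 2.

```python
from itertools import combinations, product


def cross(o, p, q):
    """z-component of (p - o) x (q - o)."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])


def is_edge(u, v, vertices):
    """True if all other vertices lie strictly on one side of the line through u, v."""
    sides = [cross(u, v, w) for w in vertices if w != u and w != v]
    return all(s > 0 for s in sides) or all(s < 0 for s in sides)


def is_2_neighbourly(atoms):
    """Check every signed pair of distinct atoms (assumes P has all 2n vertices)."""
    vertices = list(atoms) + [(-a[0], -a[1]) for a in atoms]
    for a, b in combinations(atoms, 2):
        for sa, sb in product((1, -1), repeat=2):
            u = (sa * a[0], sa * a[1])
            v = (sb * b[0], sb * b[1])
            if not is_edge(u, v, vertices):
                return False
    return True


# Illustrative atom sets (assumptions): n = 2 and n = 3 atoms in d = 2.
two_atoms = [(1, 0), (0, 1)]
three_atoms = [(1, 0), (0, 1), (1, 1)]

square_ok = is_2_neighbourly(two_atoms)     # all 4 signed pairs are edges
hexagon_ok = is_2_neighbourly(three_atoms)  # 12 signed pairs but only 6 edges
```

As in the discussion of Fig. 2(b), the three-atom polygon fails because some signed pairs, such as $(a_1, a_2)$ in this example, are separated by another vertex and so do not span an edge.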

III. Primal-Dual Geometry of Sparse Recovery 

Authors such as Chen, Donoho and Saunders [2] and Fuchs [10] have pointed out that the linear
program (P1) has a corresponding dual linear program [20], [21]

$\max_c c^T y$ subject to $\|c^T A\|_\infty \le 1$    (6)

such that for any optimal solution $x_{opt}$ to (P1) there must be a corresponding optimal
solution $c_{opt}$ to (6) and this will have the same cost $c_{opt}^T y = \|x_{opt}\|_1$. The
inequality condition in (6) can be rewritten
$\|c^T A\|_\infty \le 1 \Leftrightarrow |c^T a_i| \le 1$ for all $a_i \in A$, or alternatively
$+a_i^T c \le 1$ and $-a_i^T c \le 1$ for all $a_i \in A$. Therefore this dual linear program (6)
defines a second polytope
$Q = \{c \mid +a_i^T c \le 1, \; -a_i^T c \le 1 \text{ for all } a_i \in A\}$ over the space of
$c$ associated with our dual optimization problem.

To formalize this, we need a little more terminology (for details see e.g. [18]). Any polytope $P$
can be associated with a dual polytope $Q$ where each $k$-face of $P$ is associated with a
$(d - k - 1)$-face of $Q$. Hence each vertex (0-face) of $P$ corresponds to a facet
($(d - 1)$-face) of $Q$. Suppose we have a polytope $P$ with vertices $v_i \in V$. The polytope
$P^* = \{y \mid v_i^T y \le 1, \; \forall v_i \in V\}$ is known as the polar polytope of $P$. If
$P$ is a polytope that contains the origin in its interior, then $P^*$ is also a polytope, and
$(P^*)^* = P$. Furthermore, vertices, facets, and general $k$-faces of $P^*$ are in a one-to-one
correspondence with the facets, vertices, and $(d - k - 1)$-faces of $P$, respectively.

Hence the dual polytope specifying the feasible region for $c$ in (6) is simply the polar polytope
$P^*$ of our original polytope $P$ whose vertices are the basis vector pairs $\pm a_i$ with
$a_i \in A$. Fig. 3 illustrates this for the set of basis vectors
$\pm A = \{a_1, a_2, -a_1, -a_2\}$. The facets of the polar (dual) polytope (Fig. 3(b)) are along
the hyperplanes $a_i^T y = 1$. The vectors $\pm a_i^\dagger$ shown on the polar polytope figure
are scaled versions of the atoms defined by $\pm a_i^\dagger = \pm a_i / \|a_i\|_2^2$. We notice
that the $a_i^\dagger$ touch the supporting hyperplanes of the dual polytope $P^*$ since
$a_i^T a_i^\dagger = a_i^T a_i / \|a_i\|_2^2 = 1$, and that $a_i^\dagger$ is the (transpose of
the) Moore-Penrose pseudo-inverse of $a_i$. In this particular example we have chosen a unit
length atom for $a_1$, $\|a_1\|_2 = 1$, so that $a_1^\dagger = a_1$.

Fig. 3. Primal (a) and polar dual (b) polytopes corresponding to the atom set $A = \{\pm a_1, \pm a_2\}$

We can also construct a polar polytope for a subset of atoms, although we have to be slightly
careful in this case. If we choose $m < d$ atoms to generate our primal polytope, it only occupies
at most an $m$-dimensional subspace of $\mathbb{R}^d$, and its polar polyhedron (unbounded
polytope) extends to infinity. To avoid this problem we instead introduce the concept of a
relative polar polytope $P^*$ for an $m$-dimensional polytope $P$ with $m < d$, defined to be the
intersection of the polar polyhedron with the affine hull of the vertices of $P$ (i.e. the
subspace occupied by the vertices of $P$). This is therefore the $m$-dimensional polar polytope
$P^*$ generated if we considered $P$ and $P^*$ both to be restricted to the $m$-dimensional
subspace that $P$ occupies. In what follows, where it is clear from context, we will simply use
'polar polytope' to refer to the relative polar polytope.
A. Primal-dual solution correspondence

If we have a solution to the dual linear program (6) we can find the corresponding solution to the
primal linear program (P1) using complementary slackness. To simplify this we will first
reformulate (P1) and (6) into their equivalent standard form.

Let $\tilde{x} = (\tilde{x}_1, \ldots, \tilde{x}_{2n})^T$ be the nonnegative vector

$\tilde{x}_i = \begin{cases} \max(x_i, 0) & 1 \le i \le n \\ \max(-x_{i-n}, 0) & n + 1 \le i \le 2n \end{cases}$    (7)

and let $\tilde{A} = [A, -A]$ be the corresponding doubled matrix. Any solution to $Ax = y$ can be
written in the form $\tilde{A}\tilde{x} = y$ with nonnegative $\tilde{x}$. Using this notation we
have $\|x\|_1 = 1^T \tilde{x}$ so we can write the primal and dual problems (P1) and (6)
respectively as

$\min_{\tilde{x}} 1^T \tilde{x}$ such that $\tilde{A}\tilde{x} = y$ and $\tilde{x} \ge 0$    (8)

$\max_c y^T c$ such that $\tilde{A}^T c \le 1$.    (9)

Then the complementary slackness of these linear programs gives us the following lemma
immediately [21, p95]:

Lemma III.1: Suppose that $\tilde{x}$ and $c$ are optimal in (8) and (9). If a component of
$\tilde{x}$ in (8) is positive, $\tilde{x}_i > 0$, then we must have equality
$\tilde{a}_i^T c = 1$ for the corresponding inequality in (9). Therefore for a given solution
$c_{opt}$ we can identify the possible positive elements of $\tilde{x}$ by identifying the atoms
for which $\tilde{a}_i^T c_{opt} = 1$.
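
The change of variables to standard form is simple to state in code. The following minimal sketch splits a signed coefficient vector into its nonnegative positive and negative parts and maps it back again; the function names are our own, and the example vector is arbitrary.

```python
def to_standard_form(x):
    """Map a signed n-vector x to the nonnegative 2n-vector of its +/- parts."""
    positive_part = [max(v, 0) for v in x]
    negative_part = [max(-v, 0) for v in x]
    return positive_part + negative_part


def from_standard_form(x_tilde):
    """Recover the signed n-vector from its nonnegative 2n-vector."""
    n = len(x_tilde) // 2
    return [x_tilde[i] - x_tilde[n + i] for i in range(n)]


x = [3, 0, -2]
x_tilde = to_standard_form(x)                 # [3, 0, 0, 0, 0, 2]
l1_norm = sum(abs(v) for v in x)              # 5
standard_cost = sum(x_tilde)                  # 5, the cost 1^T x_tilde
x_recovered = from_standard_form(x_tilde)     # [3, 0, -2]
```

Because at most one of the two entries associated with each atom is nonzero, the linear cost in the standard form equals the $\ell_1$ norm of the original vector.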

B. Brute force algorithm for optimization of (P1)

It is a standard result from linear programming that the optimum of the linear function is
obtained at one (or more) of the extreme points [21]. This therefore leads to the following
(conceptual) 'brute force' algorithm for minimizing the $\ell_1$ norm (P1):

1. Enumerate the set $V$ of the vertices of the polar polytope
$P^* = \{c \mid \tilde{A}^T c \le 1\}$.

2. Search over $V$ to find $c_{opt} = \arg\max_{c \in V} c^T y$.

3. Recover $\tilde{A}_{opt}$ from $c_{opt}$ and solve for
$\tilde{x}_{opt} = \tilde{A}_{opt}^{-1} y$.

We could then recover the basis set $\tilde{A}_{opt}$ corresponding to $c_{opt}$, since we have
$\tilde{a}_i \notin \tilde{A}_{opt}$ if $\tilde{a}_i^T c_{opt} < 1$ and we consider the remaining
atoms (for which $\tilde{a}_i^T c_{opt} = 1$) to be in $\tilde{A}_{opt}$. A singular
$\tilde{A}_{opt}$ would indicate non-unique $\tilde{x}_{opt}$, or no solution. Alternatively, if
we save the basis sets during vertex enumeration at step 1, $\tilde{A}_{opt}$ can be recovered
more directly. If there were a subspace of optimal solutions for $c$ which maximize $c^T y$ then
some of the recovered components of $\tilde{x}_{opt}$ will in fact be zero.

Now, this algorithm is not meant to be a practical one, particularly since step 1 requires the
solution of the vertex enumeration problem. The number of vertices of a polytope can increase
very quickly with the number of facets, and the computational and storage complexity of vertex
enumeration algorithms can also be very high [19].
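
The three steps above can be sketched directly for $d = 2$, where every vertex of $P^*$ is the intersection of two facet lines $\tilde{a}_i^T c = 1$. The code below is a conceptual illustration only, under the assumptions that the atoms span $\mathbb{R}^2$ and that $P^*$ is nondegenerate (exactly two constraints are active at each vertex). The function names and the example atom set are our own, and exact rational arithmetic is used so that the feasibility test is unambiguous.

```python
from fractions import Fraction
from itertools import combinations


def solve2(row1, row2, r1, r2):
    """Solve the 2x2 system [row1; row2] z = (r1, r2) by Cramer's rule."""
    det = row1[0] * row2[1] - row1[1] * row2[0]
    if det == 0:
        return None  # parallel lines, e.g. a facet and its antipode
    return ((r1 * row2[1] - r2 * row1[1]) / det,
            (row1[0] * r2 - row2[0] * r1) / det)


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def brute_force_bp(atoms, y):
    """Conceptual brute-force solution of (P1) for d = 2 (nondegenerate case)."""
    atoms = [tuple(Fraction(v) for v in a) for a in atoms]
    y = tuple(Fraction(v) for v in y)
    n = len(atoms)
    doubled = atoms + [(-a[0], -a[1]) for a in atoms]  # columns of [A, -A]

    # Step 1: enumerate vertices of P* as feasible intersections of two facets.
    vertices = []
    for i, j in combinations(range(2 * n), 2):
        c = solve2(doubled[i], doubled[j], 1, 1)
        if c is not None and all(dot(a, c) <= 1 for a in doubled):
            vertices.append((c, (i, j)))

    # Step 2: pick the vertex maximizing c^T y.
    c_opt, (i, j) = max(vertices, key=lambda item: dot(item[0], y))

    # Step 3: solve [a_i, a_j] x_opt = y for the two active atoms.
    a, b = doubled[i], doubled[j]
    xi, xj = solve2((a[0], b[0]), (a[1], b[1]), y[0], y[1])
    x_tilde = [Fraction(0)] * (2 * n)
    x_tilde[i], x_tilde[j] = xi, xj
    x = [x_tilde[k] - x_tilde[n + k] for k in range(n)]
    return c_opt, x


# Illustrative problem (an assumption): y = a_1 + a_3 with three atoms in d = 2.
c_opt, x = brute_force_bp([(1, 0), (0, 1), (1, 1)], (2, 1))
dual_cost = c_opt[0] * 2 + c_opt[1] * 1
primal_cost = sum(abs(v) for v in x)
```

In this example the selected vertex is $c_{opt} = (1, 0)$, whose active atoms are $a_1$ and $a_3$, and the primal and dual costs agree, as required by LP duality. The pairwise enumeration in step 1 is quadratic in the number of facets in $d = 2$, but the combinatorial growth in higher dimensions is what makes the approach impractical in general.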

Nevertheless, it is interesting that this algorithm is very reminiscent of a clustering
algorithm, as if the vertices $v_j$ of $P^*$ are cluster target vectors, and we wish to associate
$y$ with the 'cluster' (vertex) which 'best matches' (has largest dot product with) the target.
This may give us a natural way to connect sparse coding with the ICA Mixture Model, which selects
between possible representation basis sets depending on the region occupied by $y$ [22]. Consider
also that in many cases the system (P1) is to be solved many times for different observations
$y$. In this case, it would be possible to 'cache' previous known vertices $c$ and their
associated basis sets $\tilde{A}_{opt}$, to use as a set of starting points for new solutions.
Observations close to those already found would then be solved immediately, simply requiring a
check for optimality.

C. Visualizing the primal-dual solution correspondence 




Fig. 4. Primal-dual solution correspondence 

Let us consider the subsets of $\{y \in \mathbb{R}^d\}$ which give particular vertices of $P^*$
and the corresponding representation basis sets (Fig. 4). In Fig. 4(a) the shaded region $R_{++}$
denotes a cone in $y$-space represented by nonnegative amounts $\tilde{x}_1, \tilde{x}_2 \ge 0$
of the basis vectors $+a_1, +a_2$. This segment is bounded by the half-rays in the direction of
the corresponding basis vectors. It is straightforward to verify that the dot product
$c_{++}^T y$ of any $y \in R_{++}$ with the vertex $c_{++}$ will be larger than the dot product
with any other vertex $c_j$, and hence for any $y \in R_{++}$, $c_{++}$ is the point within the
polytope that maximizes $c^T y$, as required by the dual linear program (6) or its standard form
(9).

Stretching notation slightly, we may refer to the vertex of the (relative) polar polytope $P^*$
that corresponds to a particular active basis set simply as the vertex of that basis set: hence we
say that $c_{++}$ is the vertex of the basis set $\{+a_1, +a_2\}$. In simple cases we find that
the vertex $c_j$ is contained within the corresponding cone $R_j$, but this is not necessary. For
example in Fig. 4(b) we see that $c_{++}$ is not contained in the cone $R_{++}$: in this case we
may say that the basis set has an external vertex.

Finally, consider now the observation $y = \beta(+a_1)$ for some $\beta > 0$, which has the
optimal solution $x = (\beta, 0)$ corresponding to $\tilde{x} = (\beta, 0, 0, 0)$. The quantity
$c^T y$ is maximized for any $c$ along the edge joining $c_{++}$ and $c_{+-}$, i.e. any
$c \in \mathrm{conv}\{c_{++}, c_{+-}\}$. Our brute force algorithm would enumerate the vertices,
so would select either $c_{opt} = c_{++}$ or $c_{opt} = c_{+-}$, and hence determine
$\tilde{A}_{opt} = [+a_1, +a_2]$ or $\tilde{A}'_{opt} = [+a_1, -a_2]$ respectively. But in either
case, we can confirm that solving $\tilde{x}_{opt} = \tilde{A}_{opt}^{-1} y$ would give
$\tilde{x} = (\beta, 0, 0, 0)$, so recovering the desired solution.

IV. The Fuchs Condition 

In its original form, the Fuchs Condition (Theorem 1.2) seems difficult to interpret (see e.g.
comments in [13], [11]). However, if we convert it into its equivalent 'standard form' (in LP
terminology) in terms of nonnegative $\tilde{x}$ then we can relate it more clearly to our
polytope geometry. First however we give the Fuchs Condition in its 'standard form', and show
that it is the weakest possible condition for sparse recovery, in that it is both necessary and
sufficient for (P1) to find a particular solution to (P0). In what follows we form $\tilde{x}$
from $x$ using (7) together with the corresponding doubled matrix $\tilde{A} = [A, -A]$.

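Theorem IV.1 (Fuchs Condition in standard form) Let $\tilde{x}_0$ be a solution of
$\tilde{A}\tilde{x} = y$, $\tilde{x} \ge 0$. Let $\tilde{x}_{opt}$ be the $m$-dimensional vector
built from the nonzero components of $\tilde{x}_0$, with $\tilde{A}_{opt}$ the $d \times m$ matrix
built from the corresponding columns of $\tilde{A}$, such that
$y = \tilde{A}_{opt}\tilde{x}_{opt} = \tilde{A}\tilde{x}_0$. Then $\tilde{x}_0$ is the unique
optimum point of (8) if and only if $\tilde{A}_{opt}$ has full rank and there exists some $c$ such
that

$\tilde{a}_j^T c = 1$,  $\tilde{a}_j \in \tilde{A}_{opt}$    (10)
$\tilde{a}_j^T c < 1$,  $\tilde{a}_j \notin \tilde{A}_{opt}$    (11)

where $\tilde{a}_j$ ranges over the columns of $\tilde{A}$.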

Proof: For the 'if' direction, the set of feasible solutions $c$ to (9) must satisfy
$\tilde{a}_j^T c \le 1$ for all $\tilde{a}_j \in \tilde{A}$. Complementary slackness states that
the following two statements are equivalent [21]:

1. $\tilde{x}$ and $c$ are optimum solutions of (8) and (9)

2. if a component $\tilde{x}_j$ of $\tilde{x}$ is positive, then the corresponding inequality
$\tilde{a}_j^T c \le 1$ is satisfied with equality, i.e. $\tilde{a}_j^T c = 1$.

Now the basis vectors $\tilde{a}_j \in \tilde{A}_{opt}$ are those for which $\tilde{x}_j > 0$.
Therefore the condition $\tilde{a}_j^T c = 1$ for $\tilde{a}_j \in \tilde{A}_{opt}$ is sufficient
to specify that $\tilde{x}_0$ must be an optimum of (8) and $c$ must be an optimum of (9).

Complementary slackness also gives us that for optimum solutions $\tilde{x}$ and $c$, if an
inequality $\tilde{a}_j^T c \le 1$ is satisfied with strict inequality, $\tilde{a}_j^T c < 1$,
then the corresponding component $\tilde{x}_j$ of $\tilde{x}$ must be zero. Therefore the
condition $\tilde{a}_j^T c < 1$ for $\tilde{a}_j \notin \tilde{A}_{opt}$ requires that any optimal
solution $\tilde{x}$ to (8) must have zero components $\tilde{x}_j = 0$ corresponding to
$\tilde{a}_j \notin \tilde{A}_{opt}$. Therefore since $\tilde{A}_{opt}$ is full rank, the optimal
solution is unique and is given by $\tilde{x}_{opt} = \tilde{A}_{opt}^\dagger y$.

For the converse, suppose first that $\tilde{A}_{opt}$ does not have full rank. Then there is a
linear subspace of possible solutions for $\tilde{x}_{opt}$ satisfying
$\tilde{A}_{opt}\tilde{x}_{opt} = y$. Therefore another solution $\tilde{x}'_{opt}$ would exist
with smaller or identical cost $1^T \tilde{x}'_{opt}$ so $\tilde{x}_{opt}$ could not be the unique
minimum. Hence if $\tilde{x}_{opt}$ is the unique minimum, then $\tilde{A}_{opt}$ must have full
rank.

For the other conditions, we have a feasible solution $\tilde{x}_0$ to (8) and we know that
$c = 0$ is a feasible solution to (6) and (9) since $\tilde{A}^T 0 = 0 \le 1$, so both the primal
and dual linear programs have a solution. Since $\tilde{x}_0$ is an optimum of (8) then there must
be at least one optimum solution $c$ of (9). By complementary slackness, for any $i$ with
$\tilde{x}_i > 0$ and hence $\tilde{a}_i \in \tilde{A}_{opt}$ we must have $\tilde{a}_i^T c = 1$
for any optimum $c$. Furthermore, if $\tilde{x}_0$ is the unique optimum, then there is no optimum
solution with $\tilde{x}_i > 0$ with corresponding vector $\tilde{a}_i \notin \tilde{A}_{opt}$, so
there must be a solution, say $c_i$, for which $\tilde{a}_i^T c_i < 1$. Any convex combination of
these optimal solutions $c_i$ must also be an optimal solution so let us choose e.g.
$c' = \mathrm{mean}\{c_i \mid \tilde{a}_i^T c_i < 1\}$. Then $c'$ is an optimum and
$\tilde{a}_i^T c' < 1$ for all $\tilde{a}_i \notin \tilde{A}_{opt}$. We have therefore constructed
a $c$ which satisfies the required conditions. ■

Let us verify that this is equivalent to the Fuchs Condition. 

Lemma IV.2: A vector $c$ satisfies the Fuchs Condition in Theorem 1.2 if and only if it satisfies
the condition in Theorem IV.1.

Proof: First we note that $A_{opt}$ and $\tilde{A}_{opt}$ contain identical columns except for
sign changes so the full rank condition on each is equivalent.

For the other conditions in Theorem 1.2, for $a_j \in A_{opt}$, for which $[x_{opt}]_j \neq 0$,
we have $a_j^T c = \mathrm{sign}([x_{opt}]_j)$. If $[x_{opt}]_j > 0$ we get $a_j^T c = 1$ and
$-a_j^T c = -1 < 1$ so $\tilde{a}_j = a_j \in \tilde{A}_{opt}$,
$\tilde{a}_{n+j} = -a_j \notin \tilde{A}_{opt}$. Alternatively if $[x_{opt}]_j < 0$ we get
$a_j^T c = -1 < 1$ and $-a_j^T c = 1$ so $\tilde{a}_j = a_j \notin \tilde{A}_{opt}$,
$\tilde{a}_{n+j} = -a_j \in \tilde{A}_{opt}$. For $a_j \notin A_{opt}$, we have $|a_j^T c| < 1$ so
$-1 < a_j^T c < 1$, i.e. $-a_j^T c < 1$ and $+a_j^T c < 1$, thus $\tilde{a}_{n+j}^T c < 1$ and
$\tilde{a}_j^T c < 1$, so $\tilde{a}_{n+j} \notin \tilde{A}_{opt}$ and
$\tilde{a}_j \notin \tilde{A}_{opt}$.

Showing the converse is similarly straightforward, noting that $a_j^T c = 1$ and $-a_j^T c = 1$
can never both be satisfied at once. ■

From this equivalence we immediately get the following result. 

Corollary IV.3: The Fuchs Condition (Theorem 1.2) is both necessary and sufficient for a given
$x_0$ to be the unique minimum of (P1).

Looking at the Fuchs Condition, we see that in standard form (Theorem IV.1) it only depends
on $\tilde{A}_{opt}$, or in original form (Theorem 1.2) on $A_{opt}$ and the signs of $x_{opt}$.
The following result is therefore immediate.

Theorem IV.4: The condition for $\tilde{x}_0$ to be the unique minimum of (8) depends only on the
support of $\tilde{x}_0$. Or equivalently: The condition for $x_0$ to be the unique minimum of
(P1) depends only on the support of $x_0$ and signs of $x_0$ on its support.

Proof: The support of $\tilde{x}_0$ determines $\tilde{A}_{opt}$ and hence both the rank of
$\tilde{A}_{opt}$ and existence of $c$ in the Fuchs condition in the standard form (Theorem
IV.1). The support and signs of $x_0$ determine the support of $\tilde{x}_0$. ■

As noted by Donoho [15] this 'discreteness of individual equivalence' has been observed by previous
authors [5], [23]. It means for instance that if a particular $x_0$ is the unique optimal solution to
$(P_1)$ with $y = Ax_0$, then all $x_0'$ with the same support and signs will also be the respective unique
optimal solution to $(P_1)$ with $y' = Ax_0'$.

A. Geometry of the Fuchs condition 

Let us examine a geometrical interpretation of the preceding theorems in terms of the polar 
polytope P* we introduced earlier. 

Theorem IV.5: Suppose that $\tilde{A}_{\mathrm{opt}}$ has full rank. The solution $\tilde{x}_0$ with $m$ nonzeros in Theorem
IV.1 is the unique optimum point of (8) if and only if the polar $d$-polytope given by
$P^* = \{c \mid \tilde{A}^T c \le \mathbf{1}\}$ has a $(d-m)$-dimensional face $F^*_{\mathrm{opt}} = \{c \in P^* \mid \tilde{A}_{\mathrm{opt}}^T c = \mathbf{1}\}$ specified by the $m$ additional
equalities $\tilde{A}_{\mathrm{opt}}^T c = \mathbf{1}$.

Proof: For $0 < m < d$, the conditions in Theorem IV.1 are equivalent to the requirement for
$c$ to be in the relative interior of the $(d-m)$-face $F^*_{\mathrm{opt}} = \{c \in P^* \mid \tilde{A}_{\mathrm{opt}}^T c = \mathbf{1}\}$. Therefore such a $c$
exists if and only if the face exists and is nondegenerate. For $m = d$ the conditions are equivalent
to $c$ being exactly the vertex (0-face) $c = (\tilde{A}_{\mathrm{opt}}^{-1})^T \mathbf{1}$. ■

Consequently the Fuchs Condition in either its original form (Theorem I.2) or its standard form
(Theorem IV.1) corresponds to the existence of the $(d-m)$-dimensional face of $P^*$ in Theorem
IV.5, since for a $c$ to exist it must be in the relative interior of that face (for $m < d$) or be the
single vertex point (for $m = d$).

B. Visualizing the Fuchs Condition 

Let us return to Fig. 4, with $x_0 = (\beta, 0)$, $\beta > 0$ in each of Fig. 4(a) and (b). In both figures we
have $F^*_{\mathrm{opt}} = \mathrm{conv}\{c_{++}, c_{+-}\}$, which is the line segment joining $c_{++}$ to $c_{+-}$. Therefore the Fuchs Condition
(Theorem I.2 and Theorem IV.1) is satisfied by any $c$ in the relative interior of this segment,
$c \in \mathrm{relint}\, F^*_{\mathrm{opt}}$, i.e. any point on the line joining $c_{++}$ to $c_{+-}$ except for the end points $c_{++}$ and $c_{+-}$
themselves.

We notice in passing that $(a_1^\dagger)^T \in \mathrm{relint}\, F^*_{\mathrm{opt}}$ in Fig. 4(a) but not in Fig. 4(b), so $c = (a_1^\dagger)^T$ satisfies the
Fuchs Condition in the first case but not the second. We shall see later that this will distinguish
the Fuchs Condition from the Fuchs Corollary (Corollary I.3).

C. Relationship to the primal polytope 

Now $P^*$ is the polar (dual) of the primal polytope $P$ with vertices $\pm a_i$, $a_i \in A$. Therefore the
$(d-m)$-face of the polar polytope $F^*_{\mathrm{opt}} = \{c \in P^* \mid \tilde{A}_{\mathrm{opt}}^T c = \mathbf{1}\}$, which we might call the dual face,
corresponds to the $(m-1)$-face $F_{\mathrm{opt}} = P \cap \mathrm{conv}\{\tilde{a}_j \in \tilde{A}_{\mathrm{opt}}\}$ of the primal polytope $P$, i.e. the
corresponding primal face [18]. The dual face on $P^*$ exists and is nondegenerate if and only if
the primal face on $P$ exists and is a simplex. Therefore we have the following result, echoing the
individual equivalence results of Donoho [15]:

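Theorem IV.6: Let $\tilde{x}_0$ be a solution of $\tilde{A}\tilde{x} = y$, $\tilde{x} \ge 0$, with $m$ nonzeros, and let $\tilde{x}_{\mathrm{opt}}$ and $\tilde{A}_{\mathrm{opt}}$ be
constructed as before. Then $\tilde{x}_0$ is the unique optimum point of (8) if and only if
$F_{\mathrm{opt}} = \mathrm{conv}\{\tilde{a}_j \in \tilde{A}_{\mathrm{opt}}\}$ is an $(m-1)$-face of $P$.

Proof: This follows immediately from the preceding arguments, once we note that $\tilde{A}_{\mathrm{opt}}$ has
full rank if and only if $F_{\mathrm{opt}} = \mathrm{conv}\{\tilde{a}_j \in \tilde{A}_{\mathrm{opt}}\}$ has dimension $(m-1)$ and all $\tilde{a}_j \in \tilde{A}_{\mathrm{opt}}$ are nonzero. ■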

To summarize, for a given solution $x_0$ to $Ax = y$ with $m = \|x_0\|_0$ nonzeros to be $\ell_1$-unique-optimal,
or equivalently for the nonnegative solution $\tilde{x}_0$ to $\tilde{A}\tilde{x} = y$ to be $\ell_1$-unique-optimal, we
have the following equivalent conditions:

1. Fuchs Condition in the standard form (Theorem IV.1)

2. Fuchs Condition in the original form (Theorem I.2)

3. Existence of a nondegenerate dual $(d-m)$-face $F^*_{\mathrm{opt}}$ of $P^*$ (Theorem IV.5)

4. Existence of a primal face $F_{\mathrm{opt}}$ of $P$ which is an $(m-1)$-simplex (Theorem IV.6)

Furthermore any $c$ that satisfies the Fuchs Condition (Theorem IV.1 or Theorem I.2) is contained
in the relative interior of the dual face $F^*_{\mathrm{opt}}$ of $P^*$.

To use our approach to confirm the main result of Donoho [15], suppose $P$ is $k$-neighbourly. Then
all representations $y = \tilde{A}_{\mathrm{opt}} \tilde{x}_{\mathrm{opt}}$ with $m \le k$ nonzeros have a face $F_{\mathrm{opt}}$ of the centrally-symmetric
primal polytope $P$ which is an $(m-1)$-simplex. Therefore the Fuchs Condition is satisfied for all
$\tilde{x}_0$ with at most $k$ nonzeros, and we have $\ell_1$-unique-optimality. Note that we have not required
the assumption of general position of the columns of $A$: the requirement of $k$-neighbourliness of
the centrally symmetric $P$ is sufficient to require linear independence of the columns of all optimal
submatrices $A_{\mathrm{opt}}$ with at most $m$ columns, which requires $\mathrm{Spark}(A) > m$. Finally for
$\ell_1/\ell_0$-equivalence we simply need to add the stronger condition $m < \mathrm{Spark}(A)/2$, so if $P$ is $k$-neighbourly
then we have $\ell_1/\ell_0$-equivalence if $m \le \min(k, \mathrm{Spark}(A)/2 - 1)$.

V. Fuchs Corollary 

Let us write down an equivalent of the stronger Fuchs Corollary (Corollary I.3) in the standard
form.

Corollary V.1 (Fuchs Corollary in standard form): For a desired solution $\tilde{x}_0$ to $\tilde{A}\tilde{x} = y$, let us
construct $\tilde{x}_{\mathrm{opt}}$ and $\tilde{A}_{\mathrm{opt}}$ as before. If $\tilde{A}_{\mathrm{opt}}$ has full rank and

$$\tilde{a}_j^T c_{\mathrm{opt}} < 1 \quad \text{for all } \tilde{a}_j \in \tilde{A},\ \tilde{a}_j \notin \tilde{A}_{\mathrm{opt}} \tag{12}$$

is satisfied with the specific dual vector $c_{\mathrm{opt}} = (\tilde{A}_{\mathrm{opt}}^\dagger)^T \mathbf{1}$, then $\tilde{x}_0$ is the unique optimum to (9).
The dual vector $c_{\mathrm{opt}} = (\tilde{A}_{\mathrm{opt}}^\dagger)^T \mathbf{1}$ is the vertex of our (signed) basis set $\tilde{A}_{\mathrm{opt}}$.

From our geometric viewpoint, the Fuchs Corollary requires that the dual face
$F^*_{\mathrm{opt}} = \{c \in P^* \mid \tilde{A}_{\mathrm{opt}}^T c = \mathbf{1}\}$ corresponding to the signed optimal basis $\tilde{A}_{\mathrm{opt}}$ exists (as for the Fuchs Condition),
and additionally that the basis vertex $c_{\mathrm{opt}} = (\tilde{A}_{\mathrm{opt}}^\dagger)^T \mathbf{1}$ is contained in its relative interior,
$c_{\mathrm{opt}} \in \mathrm{relint}\, F^*_{\mathrm{opt}}$.

From a practical point of view, one advantage of the Fuchs Corollary over the Fuchs Condition
is that it is easier to test. The probe point $c_{\mathrm{opt}}$ can be constructed directly from $\tilde{x}_0$ and $\tilde{A}$, while
testing the Fuchs Condition would require the relevant face of $P^*$ to be found.
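
To illustrate how directly the probe point can be built, here is a minimal sketch of the Fuchs Corollary test written in the original (unsigned) form, where it is assumed to read $|a_j^T c_{\mathrm{opt}}| < 1$ off the support with $c_{\mathrm{opt}} = (A_{\mathrm{opt}}^\dagger)^T \mathrm{sign}(x_{\mathrm{opt}})$. The helper names (`solve`, `basis_vertex`, `fuchs_corollary`) and the tolerance are hypothetical; the pseudo-inverse is formed through the Gram matrix, which is only valid when $A_{\mathrm{opt}}$ has full column rank. The example atoms are those of (14)-(16).

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def solve(G, b):
    """Solve G z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(G[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("A_opt does not have full column rank")
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def basis_vertex(opt_atoms, signs):
    """c_opt = (A_opt^dagger)^T s = A_opt (A_opt^T A_opt)^{-1} s."""
    G = [[dot(ai, aj) for aj in opt_atoms] for ai in opt_atoms]
    z = solve(G, signs)
    d = len(opt_atoms[0])
    return [sum(zk * a[i] for zk, a in zip(z, opt_atoms)) for i in range(d)]


def fuchs_corollary(atoms, x0, tol=1e-9):
    """True if |a_j^T c_opt| < 1 for every atom outside the support of x0."""
    support = [j for j, xj in enumerate(x0) if xj != 0]
    signs = [1.0 if x0[j] > 0 else -1.0 for j in support]
    c_opt = basis_vertex([atoms[j] for j in support], signs)
    return all(abs(dot(atoms[j], c_opt)) < 1.0 - tol
               for j in range(len(atoms)) if j not in support)


s = 1.0 / math.sqrt(3.0)
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [s, s, s]]
print(fuchs_corollary(atoms, [1.0, 1.0, 0.0]))   # same signs on a_1, a_2
print(fuchs_corollary(atoms, [1.0, -1.0, 0.0]))  # opposite signs
```

Only one linear solve and $n - m$ inner products are needed per candidate solution, which is the practical advantage referred to above.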



A. Visualizing the Fuchs Corollary 

Consider again Fig. 4 with $x_0 = (\beta, 0)$, $\beta > 0$. Here we have $\tilde{A}_{\mathrm{opt}} = [a_1]$ and hence $(\tilde{A}_{\mathrm{opt}}^\dagger)^T = [+(a_1^\dagger)^T]$,
so our basis vertex is given by $c_{\mathrm{opt}} = (\tilde{A}_{\mathrm{opt}}^\dagger)^T \mathbf{1} = +(a_1^\dagger)^T \cdot 1 = (a_1^\dagger)^T$. Since $F^*_{\mathrm{opt}} = \mathrm{conv}\{c_{++}, c_{+-}\}$, which
is the line segment joining $c_{++}$ to $c_{+-}$, clearly $c_{\mathrm{opt}} \in \mathrm{relint}\, F^*_{\mathrm{opt}}$ in Fig. 4(a), but $c_{\mathrm{opt}} \notin \mathrm{relint}\, F^*_{\mathrm{opt}}$
in Fig. 4(b). Therefore, while the Fuchs Condition (Theorem I.2 and Theorem IV.1) is satisfied for
$x_0 = (\beta, 0)$ in both Fig. 4(a) and (b), the Fuchs Corollary (Corollary I.3) is only satisfied for this
$x_0$ in Fig. 4(a). This confirms that the Fuchs Corollary is indeed strictly stronger than the Fuchs
Condition (see also [10]).

VI. Exact Recovery Condition 

We saw in the Introduction that the Exact Recovery Condition (Theorem I.4) of Tropp [9] can
be derived as a corollary of the Fuchs Corollary (Corollary I.3). To gain geometrical insight, it is
helpful for us to state this in the following way:

Lemma VI.1: Suppose we have a desired solution $x_0$ to $y = Ax_0$. Then the Exact Recovery
Condition (Theorem I.4) is satisfied if the Fuchs Corollary (Corollary I.3) is satisfied for all $x_0'$ with
the same support as $x_0$, including solutions $x_0'$ with the same support but different signs.

Proof: This follows from (5) (see [13]). ■

From discreteness of the unique minimum condition (Theorem IV.4) we only have to test a finite
number ($2^m$) of separate conditions to check all of the different signs on a support of $m$ nonzeros. (In
fact since those with entirely reversed signs will have identical results, we only need $2^{m-1}$ tests.)

One way to see this is to explicitly construct the set of basis vertices $\{c = (A_{\mathrm{opt}}^\dagger)^T \mathrm{sign}\, x_{\mathrm{opt}}\}$ that
will need to be tested. To do this, let us construct the signs $\sigma_j \in \{+1, -1\}$ for $j = 1, \ldots, m$ and
form the sign vector $\sigma = [\sigma_1, \ldots, \sigma_m]^T \in \{+1, -1\}^m$. Then the set of basis vertices we need to
test is $V^*_{\mathrm{opt}} = \{c = (A_{\mathrm{opt}}^\dagger)^T \sigma\}$, which clearly has $2^m$ elements. ERC (Theorem I.4) will therefore be
satisfied if

$$a_j^T c < 1 \quad \text{for all } c \in V^*_{\mathrm{opt}},\ a_j \notin A_{\mathrm{opt}} \tag{13}$$

from which it is clear that each of our $2^m$ 'tests' will in fact require $(n - m)$ dot product calculations.
For dictionaries of unit-norm atoms other measures such as the mutual coherence
$M = \max_{i \neq j} |a_i^T a_j|$ can give us more practical conditions that guarantee ERC is satisfied, such as
$m < \tfrac{1}{2}(1 + M^{-1})$ [9], [13].
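
The $2^m$ vertex tests in (13) can be written out directly. The sketch below is illustrative only: the function names and tolerance are assumptions, the pseudo-inverse is again formed through the Gram matrix (so $A_{\mathrm{opt}}$ must have full column rank), and `erc_by_l1_norm` implements the usual statement of Tropp's ERC, $\max_{a_j \notin A_{\mathrm{opt}}} \|A_{\mathrm{opt}}^\dagger a_j\|_1 < 1$, for comparison.

```python
import math
from itertools import product


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def solve(G, b):
    """Solve G z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(G[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("A_opt does not have full column rank")
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def split(atoms, support):
    """Return (atoms on the support, atoms off it, Gram matrix of the former)."""
    opt = [atoms[j] for j in support]
    rest = [atoms[j] for j in range(len(atoms)) if j not in support]
    G = [[dot(ai, aj) for aj in opt] for ai in opt]
    return opt, rest, G


def erc_by_vertices(atoms, support, tol=1e-9):
    """Test (13): a_j^T c < 1 at all 2^m basis vertices c = (A_opt^dagger)^T sigma."""
    opt, rest, G = split(atoms, support)
    for sigma in product((1.0, -1.0), repeat=len(opt)):
        z = solve(G, list(sigma))
        c = [sum(zk * a[i] for zk, a in zip(z, opt)) for i in range(len(opt[0]))]
        if any(dot(a, c) >= 1.0 - tol for a in rest):
            return False
    return True


def erc_by_l1_norm(atoms, support, tol=1e-9):
    """Tropp's form: ||A_opt^dagger a_j||_1 < 1 for every atom off the support."""
    opt, rest, G = split(atoms, support)
    for a in rest:
        z = solve(G, [dot(ai, a) for ai in opt])
        if sum(abs(zk) for zk in z) >= 1.0 - tol:
            return False
    return True


r2 = math.sqrt(2.0)
atoms = [[1.0, 0.0], [r2, r2]]  # a non-unit-norm pair, chosen for illustration
for support in ([0], [1], [0, 1]):
    print(support, erc_by_vertices(atoms, support), erc_by_l1_norm(atoms, support))
```

The two tests agree because maximizing $a_j^T (A_{\mathrm{opt}}^\dagger)^T \sigma$ over sign vectors $\sigma$ yields the $\ell_1$ norm of $A_{\mathrm{opt}}^\dagger a_j$; the enumeration is exponential in $m$ and is shown only to mirror the geometric argument.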

A. Geometry of the Exact Recovery Condition 

To turn the preceding condition (13) for ERC into a geometric visualization, we can realize
that $c = (A_{\mathrm{opt}}^\dagger)^T \sigma$ is the vector in the span of the columns of $A_{\mathrm{opt}}$ which satisfies $A_{\mathrm{opt}}^T c = \sigma$, i.e.
$\mathrm{diag}(\sigma) A_{\mathrm{opt}}^T c = \mathbf{1}$, or in other words $\pm_j a_j^T c = 1$ for $a_j \in A_{\mathrm{opt}}$ and some combination of signs $\pm_j$.
Hence $V^*_{\mathrm{opt}}$ is actually the set of $2^m$ vertices of the relative polar polytope $P^*_{\mathrm{opt}}$ whose corresponding
primal polytope $P_{\mathrm{opt}}$ has the $2m$ vertices $\pm a_j$, $a_j \in A_{\mathrm{opt}}$. We call $P_{\mathrm{opt}}$ the primal basis polytope
and $P^*_{\mathrm{opt}}$ the dual basis polytope.

Consequently ERC is satisfied if and only if (a) the dual basis polytope $P^*_{\mathrm{opt}}$ is contained within
the complete polar polytope $P^*$, $P^*_{\mathrm{opt}} \subset P^*$, and (b) $P^*_{\mathrm{opt}}$ does not touch any face of $P^*$ for which
$\pm a_j^T c = 1$ for some $a_j \notin A_{\mathrm{opt}}$, for full rank $A_{\mathrm{opt}}$.
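
As a short worked check of the two facts used here (a sketch assuming $A_{\mathrm{opt}}$ has full column rank, so that $A_{\mathrm{opt}}^\dagger A_{\mathrm{opt}} = I$), the basis vertex satisfies the stated equalities, and the worst case of (13) over all sign vectors reproduces the $\ell_1$-norm form of ERC:

```latex
\begin{align*}
A_{\mathrm{opt}}^T c
  &= A_{\mathrm{opt}}^T (A_{\mathrm{opt}}^\dagger)^T \sigma
   = (A_{\mathrm{opt}}^\dagger A_{\mathrm{opt}})^T \sigma
   = \sigma, \\
\max_{\sigma \in \{+1,-1\}^m} a_j^T (A_{\mathrm{opt}}^\dagger)^T \sigma
  &= \max_{\sigma \in \{+1,-1\}^m} \sigma^T \bigl(A_{\mathrm{opt}}^\dagger a_j\bigr)
   = \bigl\| A_{\mathrm{opt}}^\dagger a_j \bigr\|_1 .
\end{align*}
```

The maximum in the second line is attained by choosing each $\sigma_i$ to match the sign of the $i$th entry of $A_{\mathrm{opt}}^\dagger a_j$.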

B. Visualizing the Exact Recovery Condition 

Consider again Fig. 4 with $x_0 = (\beta, 0)$, $\beta > 0$. Here we have $A_{\mathrm{opt}} = [a_1]$ so our primal basis
polytope is given by $P_{\mathrm{opt}} = \mathrm{conv}\{-a_1, +a_1\}$. The relative polar polytope is given by
$P^*_{\mathrm{opt}} = \{c \in \mathrm{aff}\, P_{\mathrm{opt}} \mid c^T a \le 1 \text{ for all } a \in P_{\mathrm{opt}}\}$, where $\mathrm{aff}\, P_{\mathrm{opt}}$ is the affine hull of $P_{\mathrm{opt}}$. In this case we get
$P^*_{\mathrm{opt}} = \mathrm{conv}\{-(a_1^\dagger)^T, +(a_1^\dagger)^T\}$, so $P^*_{\mathrm{opt}}$ is the line segment joining $-(a_1^\dagger)^T$ and $+(a_1^\dagger)^T$. In Fig. 4(a) we can see
that $P^*_{\mathrm{opt}} \subset P^*$ and $P^*_{\mathrm{opt}}$ is well away from the faces along $+a_2^T c = 1$ (joining $c_{-+}$ to $c_{++}$) and
$-a_2^T c = 1$ (joining $c_{--}$ to $c_{+-}$). Hence ERC is satisfied in Fig. 4(a). However, in Fig. 4(b) we can
see that $P^*_{\mathrm{opt}} \not\subset P^*$, so ERC is not satisfied.

If we repeat this analysis for some $x_0$ with $A_{\mathrm{opt}} = [a_2]$, we see that $P^*_{\mathrm{opt}} = \mathrm{conv}\{-(a_2^\dagger)^T, +(a_2^\dagger)^T\}$ so
$P^*_{\mathrm{opt}} \subset P^*$, and $P^*_{\mathrm{opt}}$ is away from the other faces, in both Fig. 4(a) and (b), and hence ERC is
satisfied for both. Similarly for some $x_0$ with $A_{\mathrm{opt}} = [a_1, a_2]$, we now have $P^*_{\mathrm{opt}} = P^*$ so clearly
$P^*_{\mathrm{opt}} \subset P^*$, and there are no $a_j \notin A_{\mathrm{opt}}$ to concern ourselves with. Hence ERC is again satisfied for
both Fig. 4(a) and (b).

This illustrates that it is possible for ERC to be satisfied for all $x_0$ with $m$ nonzeros (here $m = 2$),
but not satisfied for $x_0$ with $k < m$ nonzeros (e.g. $k = 1$ and $x_0 = (\beta, 0)$ in Fig. 4(b)). This is
in contrast to the Fuchs Condition, where the property of neighbourliness tells us that if the Fuchs
Condition is satisfied for all $x_0$ with $m$ nonzeros, then it will be satisfied for any $x_0$ with $k < m$
nonzeros [15].

VII. Unit-norm dictionaries 

Many of the equivalence results of previous authors are for dictionaries of unit norm atoms
$\|a_j\|_2 = 1$. The fact that $(a_j^\dagger)^T = a_j$ leads immediately to a number of special properties, under the
assumption that the atoms are distinct:

1. Any unit-norm dictionary has all $2n$ vertices;

2. ERC is satisfied for any 1-term (singleton) representation; 

3. In d = 2 all basis vertices are internal; 

4. Any centrally symmetric 2-polytope with 4 vertices is 2-neighbourly. 

The simple proofs of these properties are left as an exercise for the reader. While these can be useful 
properties, for visualization purposes it means we have to work harder to find examples illustrating 
the distinction between ERC and the Fuchs Condition. Nevertheless, let us explore what happens 
with the following basis set 

$$a_1 = [1, 0, 0]^T \tag{14}$$
$$a_2 = [0, 1, 0]^T \tag{15}$$
$$a_3 = (1/\sqrt{3})\,[1, 1, 1]^T \tag{16}$$

to form the matrix $A = \{\pm a_i \mid i = 1, 2, 3\}$. Suppose that our desired vector to recover is $x_0 = [1, 1, 0]^T$
so that $y = Ax_0 = a_1 + a_2$. Therefore the optimal basis set that we would like to recover given $y$
is $A_{\mathrm{opt}} = [a_1, a_2]$, which has vertex $c_{\mathrm{opt}} = (A_{\mathrm{opt}}^\dagger)^T \mathbf{1} = [1, 1, 0]^T$.

Consider first the Exact Recovery Condition. ERC requires that $\|A_{\mathrm{opt}}^\dagger a_3\|_1 = c_{\mathrm{opt}}^T a_3 < 1$, but
calculation gives $c_{\mathrm{opt}}^T a_3 = 2/\sqrt{3} > 1$, so ERC fails for this basis. We can see this graphically in
Fig. 5. The shaded cone in Fig. 5(b) shows the segment of the plane spanned by $\{a_1, a_2\}$ for which
$a_3^T c > \max_{j=1,2} a_j^T c$. Here we see that the vertex $c_{\mathrm{opt}} = c_{++0}$ is in this shaded region (Fig. 5(b)),
and has been 'cut off' by the halfspace $a_3^T c \le 1$.

[Fig. 5, panels (a) and (b)]

As confirmation of this, the dual basis polytope $P^*_{\mathrm{opt}}$ is the square in the plane $x_3 = 0$ with
vertices at $[\pm 1, \pm 1, 0]$. We can see that the corner containing $[1, 1, 0]$ ($= c_{++0}$) is not contained
within the full dual polytope, so $P^*_{\mathrm{opt}} \not\subset P^*$, and hence ERC is not satisfied.

However, we can identify vectors suitable to satisfy the Fuchs Condition. For example, consider
the point $c_p = [1, 1, -2]^T$ marked in Fig. 5(a). We can verify that $c_p^T a_1 = c_p^T a_2 = 1$, and
$|c_p^T a_3| = |(1 + 1 - 2)/\sqrt{3}| = 0 < 1$, therefore the Fuchs Condition is satisfied. In fact the relevant dual face
is $F^*_{\mathrm{opt}} = \mathrm{conv}\{c_{+++}, c_{++-}\}$, so any $c \in \mathrm{relint}\, F^*_{\mathrm{opt}}$, i.e. anywhere along the line segment strictly
between $c_{+++}$ and $c_{++-}$, will be suitable to satisfy the Fuchs Condition.

Finally if we consider the Fuchs Corollary, this requires $c_{\mathrm{opt}} = (A_{\mathrm{opt}}^\dagger)^T \mathbf{1} = c_{++0}$ to be contained in
$F^*_{\mathrm{opt}}$. This is clearly not the case, since $c_{++0} \notin P^*$ and $F^*_{\mathrm{opt}}$ is itself a face of $P^*$, so $F^*_{\mathrm{opt}} \subset P^*$ and
therefore $c_{++0} \notin F^*_{\mathrm{opt}}$. Therefore the Fuchs Corollary is not satisfied.

Consequently any desired solution $x_0 = [\beta_1, \beta_2, 0]^T$ with $\beta_1, \beta_2 > 0$ will be recovered by Basis
Pursuit, even though ERC and the Fuchs Corollary fail. Note however that visual inspection of
Fig. 5(a) will confirm that both the Fuchs Condition and the Fuchs Corollary would be satisfied
for e.g. $x_0 = [\beta_1, -\beta_2, 0]^T$ with $\beta_1, \beta_2 > 0$, even though ERC must still fail since the support of the
desired solution is unchanged.
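
The dual face in this example can be parametrized explicitly, which gives a quick numerical check of the statements above. In the sketch below (an illustration under stated assumptions, not taken from the original), every point with $a_1^T c = a_2^T c = 1$ is written as $c(t) = [1, 1, t]^T$, and the end points are assumed to be the vertices where $a_3^T c = \pm 1$, i.e. $t = -2 \pm \sqrt{3}$; the names `c_of`, `in_relint`, `t_plus` and `t_minus` are hypothetical.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


s = 1.0 / math.sqrt(3.0)
a1, a2, a3 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [s, s, s]


def c_of(t):
    """Points satisfying a_1^T c = a_2^T c = 1 have the form [1, 1, t]."""
    return [1.0, 1.0, t]


def in_relint(t, tol=1e-9):
    """c(t) is strictly inside the dual face when |a_3^T c(t)| < 1."""
    return abs(dot(a3, c_of(t))) < 1.0 - tol


# End points of the face: a_3^T c = (2 + t) / sqrt(3) = +1 or -1.
t_plus = math.sqrt(3.0) - 2.0    # assumed to correspond to c_{+++}
t_minus = -math.sqrt(3.0) - 2.0  # assumed to correspond to c_{++-}

print("a3 . c_opt =", dot(a3, c_of(0.0)))       # 2 / sqrt(3), exceeds 1
print("c_p in relint:", in_relint(-2.0))        # Fuchs Condition certificate
print("c_opt in relint:", in_relint(0.0))       # Fuchs Corollary probe point
print("face end points t =", t_minus, t_plus)
```

Under this parametrization $c_p$ ($t = -2$) is the midpoint of the face, while $c_{\mathrm{opt}}$ ($t = 0$) lies beyond the end point $t = \sqrt{3} - 2$, consistent with the Fuchs Condition holding and the Fuchs Corollary failing.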

VIII. Matching Pursuit Algorithms 

While we have seen that Tropp's ERC is sufficient but not necessary for $\ell_1$-unique-optimality, it
really comes into its own for orthogonal matching pursuit (OMP), as is clear from Tropp [9]:

Theorem VIII.1 (Tropp: Exact Recovery for OMP): Suppose we have a desired solution $x_0$ for
$y = Ax_0$ with full rank $A_{\mathrm{opt}}$ as in Theorem I.4. Then Orthogonal Matching Pursuit (OMP) will
recover $x_0$ in $m$ steps if the Exact Recovery Condition (4) holds. Conversely, suppose ERC fails for
some $y = Ax_0$ with optimal synthesis matrix $A_{\mathrm{opt}}$. Then there are signals in the column span of
$A_{\mathrm{opt}}$ which Orthogonal Matching Pursuit cannot recover in $m$ steps.

Proof: For the forward direction see [9]. For the converse, choose the signal
$y = c_{\mathrm{opt}} = (A_{\mathrm{opt}}^\dagger)^T \mathbf{1}$, for which $a_j^T y = 1$ for all $a_j \in A_{\mathrm{opt}}$. If ERC fails there exists some $a_j \notin A_{\mathrm{opt}}$ for
which $a_j^T y \ge 1 = \max_{a_i \in A_{\mathrm{opt}}} a_i^T y$. Therefore OMP may choose this $a_j \notin A_{\mathrm{opt}}$ at the first step
(and certainly will if $a_j^T y > 1$). Since we have now used up one step, and it must take at least
$m$ more steps to obtain the correct representation for $y$, OMP cannot obtain the correct $m$-term
representation in $m$ steps. ■

Recovery 'in $m$ steps' is implicit in Tropp's statement of this theorem. However, given that ERC
for all desired vectors $x_0$ with $k$ nonzeros does not imply ERC holds for all vectors $x_0$ with $m < k$
nonzeros, it may still be possible for OMP to recover the $m$-term representation in some $k > m$
steps, provided that OMP is eventually allowed to drop any zeros in the final representation.

As an example, consider the situation illustrated in Fig. 4(b), where we have $a_1 = [1, 0]^T$ and
$a_2 = [\sqrt{2}, \sqrt{2}]^T$. Suppose we wish to recover the signal $x_0 = [1, 0]^T$ from $y = Ax_0 = [1, 0]^T$, for
which $A_{\mathrm{opt}} = [a_1]$. Investigating ERC we find $A_{\mathrm{opt}}^\dagger a_2 = a_1^T a_2 = \sqrt{2} > 1$, so ERC fails, confirming
our earlier discussion.

But let us run OMP to see what happens. In step 1, OMP chooses the wrong atom $a_2$, as we
now expect, so $A^{(1)} = [a_2]$. Choosing $x_2$ to minimize the mean squared error we get
$x^{(1)} = a_2^\dagger y = [1/(2\sqrt{2})]$, producing a reconstruction
$\hat{y}^{(1)} = x^{(1)} a_2 = (1/(2\sqrt{2})) \times [\sqrt{2}, \sqrt{2}]^T = [0.5, 0.5]^T$ and residual
$r^{(1)} = y - \hat{y}^{(1)} = [0.5, -0.5]^T$. So as expected, OMP has not recovered $x_0 = [1, 0]^T$ in $m = 1$ steps.
But if we allow OMP to run for a second step, we find $a_1^T r^{(1)} = 0.5$ while $a_2^T r^{(1)} = 0$, as we would
expect for OMP. Hence in step 2, OMP chooses the remaining basis $a_1$, so $A^{(2)} = [a_1, a_2]$ (reordering
the atoms for convenience). Now choosing $x = [x_1, x_2]^T$ to minimize the mean squared error we get
$x^{(2)} = [x_1^{(2)}, x_2^{(2)}]^T = (A^{(2)})^\dagger y = [1, 0]^T$, producing a reconstruction $\hat{y}^{(2)} = A^{(2)} x^{(2)} = y$ and $r^{(2)} = 0$.
Since $x_2^{(2)} = 0$, OMP has found the correct 1-term reconstruction of $y$, albeit taking 2 steps to do
so.
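
The two OMP steps above can be reproduced with a small script. This is a minimal sketch rather than a reference implementation: the function names are hypothetical, atoms are selected by the largest raw inner product $|a_j^T r|$ without normalizing the atoms (as in the worked example), and the least-squares fit is obtained from the normal equations, which assumes the selected atoms are linearly independent.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def solve(G, b):
    """Solve G z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(G[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("selected atoms are linearly dependent")
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def omp(atoms, y, n_steps, tol=1e-12):
    """Run up to n_steps of OMP; return (coefficients, selection order, residual)."""
    selected, coeffs, residual = [], [], list(y)
    for _ in range(n_steps):
        if len(selected) == len(atoms) or math.sqrt(dot(residual, residual)) <= tol:
            break
        # Pick the unselected atom with the largest |a_j^T r|.
        scores = [-1.0 if j in selected else abs(dot(a, residual))
                  for j, a in enumerate(atoms)]
        selected.append(max(range(len(atoms)), key=scores.__getitem__))
        chosen = [atoms[j] for j in selected]
        # Least-squares refit on all selected atoms via the normal equations.
        G = [[dot(ai, aj) for aj in chosen] for ai in chosen]
        coeffs = solve(G, [dot(a, y) for a in chosen])
        approx = [sum(ck * a[i] for ck, a in zip(coeffs, chosen))
                  for i in range(len(y))]
        residual = [yi - pi for yi, pi in zip(y, approx)]
    x = [0.0] * len(atoms)
    for j, cj in zip(selected, coeffs):
        x[j] = cj
    return x, selected, residual


r2 = math.sqrt(2.0)
atoms = [[1.0, 0.0], [r2, r2]]   # a_1 and a_2 from the example
y = [1.0, 0.0]
for steps in (1, 2):
    print(steps, omp(atoms, y, steps))
```

With one step the routine selects index 1 (atom $a_2$) and leaves a residual close to $[0.5, -0.5]^T$; with two steps it adds $a_1$ and the refitted coefficient on $a_2$ returns to zero up to rounding.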

Thus failure of ERC does not require that OMP will fail, only that it cannot succeed in m steps. 
We can therefore state the following weaker condition for eventual recovery by OMP. 

Theorem VIII.2: Suppose that $x_0$ with $m$ nonzeros is a desired solution of $y = Ax_0$ which fails
ERC. Suppose further that there exists a different solution $y_1 = Ax_1$ for which ERC is satisfied,
and which covers $x_0$ in the sense that the support of $x_1$ is a superset of the support of $x_0$. Then
OMP will 'eventually' recover $x_0$ in $m_1$ steps, where $m_1 > m$ is the number of nonzeros in $x_1$.

Proof: This follows from the proof of Theorem VIII.1, but considering $x_0$ to be the desired
solution within the extended support given by $x_1$. ■

At present it is unclear whether it is common for ERC to fail at one level $m$ but be satisfied at
higher levels $m_1 > m$, so it remains to be seen whether this concept of eventual convergence of
OMP will turn out to be useful.

IX. Conclusions 

We have explored the geometry of the sparse representation problem using centrally-symmetric 
polytopes and polar (dual) polytopes. We have seen that polytopes can give us a useful insight into 
the optimality conditions introduced by Fuchs, for example, which had previously been considered 
to be difficult to interpret. 

In exploring this geometry we have also been able to tighten some of these previous results, and
link these to the polytope-based results of Donoho for the primal polytope. For example, we showed
that the Fuchs Condition is both necessary and sufficient for $\ell_1$-unique-optimality, and that there
are situations where Orthogonal Matching Pursuit (OMP) can find all $\ell_1$-unique-optimal solutions
with $m$ nonzeros, even if the Exact Recovery Condition (ERC) fails for $m$, if it is allowed to run
for additional steps.